Towards a better understanding of Burrows's Delta in literary authorship attribution

Authors

  • Stefan Evert
  • Thomas Proisl
  • Thorsten Vitt
  • Christof Schöch
  • Fotis Jannidis
  • Steffen Pielström
Abstract

Burrows’s Delta is the most established measure for stylometric difference in literary authorship attribution. Several improvements on the original Delta have been proposed. However, a recent empirical study showed that none of the proposed variants constitute a major improvement in terms of authorship attribution performance. With this paper, we try to improve our understanding of how and why these text distance measures work for authorship attribution. We evaluate the effects of standardization and vector normalization on the statistical distributions of features and the resulting text clustering quality. Furthermore, we explore supervised selection of discriminant words as a procedure for further improving authorship attribution.
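
The abstract gives no formulas, so the following is only a minimal sketch of the pipeline it refers to. It assumes the commonly cited formulation of Burrows's Delta (the mean absolute difference between z-scored relative frequencies of the most frequent words). The function names, the toy frequency matrix, and the choice of the sample standard deviation are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def zscores(freqs):
    """Standardize each word (column) across all texts (rows).

    Assumption: sample standard deviation (ddof=1); implementations differ.
    """
    freqs = np.asarray(freqs, dtype=float)
    mu = freqs.mean(axis=0)
    sigma = freqs.std(axis=0, ddof=1)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant columns
    return (freqs - mu) / sigma

def burrows_delta(freqs):
    """Pairwise Delta: mean absolute difference of z-scores (scaled Manhattan distance)."""
    z = zscores(freqs)
    diff = np.abs(z[:, None, :] - z[None, :, :])  # shape: (texts, texts, words)
    return diff.mean(axis=2)

def l2_normalize(z):
    """Vector normalization: scale each text's z-score vector to unit Euclidean length."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero vectors unchanged
    return z / norms

# Hypothetical relative frequencies: 3 texts x 4 most frequent words.
# Texts 0 and 1 are constructed to be stylistically close; text 2 differs.
toy_freqs = np.array([
    [0.050, 0.030, 0.020, 0.010],
    [0.051, 0.029, 0.021, 0.011],
    [0.030, 0.050, 0.010, 0.020],
])

delta = burrows_delta(toy_freqs)
normalized = l2_normalize(zscores(toy_freqs))
```

In this toy matrix the distance between texts 0 and 1 comes out smaller than the distance between texts 0 and 2, which is the behavior a clustering step would rely on. Applying `l2_normalize` before computing distances is one way to experiment with the vector normalization the abstract mentions.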

Similar resources

Interpreting Burrows's Delta: Geometric and Probabilistic Foundations

While Burrows’s intuitive and elegant “Delta” measure for authorship attribution has proven to be extremely useful for authorship attribution, a theoretical understanding of its operation has remained somewhat obscure. In this paper, I address this issue by introducing a geometric interpretation of Delta, which further allows us to interpret Delta as a probabilistic ranking principle. This inte...

Explaining Delta, or: How do distance measures for authorship attribution work?

Authorship Attribution is a research area in quantitative text analysis concerned with attributing texts of unknown or disputed authorship to their actual author based on quantitatively measured linguistic evidence (see Juola 2006; Stamatatos 2009; Koppel et al. 2009). Authorship attribution has applications in literary studies, history, forensics and many other fields, e.g. corpus stylistics (...

Authorship Attribution in Bengali Language

We describe Authorship Attribution of Bengali literary text. Our contributions include a new corpus of 3,000 passages written by three Bengali authors, an end-to-end system for authorship classification based on character n-grams, feature selection for authorship attribution, feature ranking and analysis, and learning curve to assess the relationship between amount of training data and test accu...

A Supervised Authorship Attribution Framework for Bengali Language

Authorship Attribution is a long-standing problem in Natural Language Processing. Several statistical and computational methods have been used to find a solution to this problem. In this paper, we have proposed methods to deal with the authorship attribution problem in Bengali. More specifically, we proposed a supervised framework consisting of lexical and shallow features, and investigated the...

The Computational-Linguistic Approach to Forensic Authorship Attribution

This article examines the diversity of methods in authorship attribution through a lens which focuses attention on a single common element. The current state of authorship attribution study is spread throughout so many academic and non-academic disciplines that it is nigh impossible to describe all of the various assumptions about language and authorship. The disciplines involved in authorship...

Publication date: 2015